1. Introduction

At the previous MPEG meeting, coding based on bidirectional video object planes (B-VOPs) was added to the MPEG-4 Video
VM. B-VOP based coding combines the flexibility of MPEG-1 B-pictures with the bit-saving efficiency of H.263 PB-frames
(B-blocks) by allowing, on a macroblock basis, a choice among the forward, backward, bidirectional and direct prediction modes.
Furthermore, B-VOPs, like I- and P-VOPs, can be of arbitrary shape. Because B-VOPs are noncausal (they are not fed back into
the interframe coding loop), they can easily be separated out to enable Temporal scalability, one of the most efficient forms of
scalability. Core experiment B1 covers Temporal scalability based on rectangular VOPs, while core experiment C1 covers
Temporal scalability based on arbitrarily shaped VOPs.
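To illustrate what a per-macroblock choice among prediction modes involves, the following Python sketch picks, for a single macroblock, whichever of the forward, backward, bidirectional (averaged) or direct predictions has the smallest sum of absolute differences (SAD). The SAD criterion, the rounded-average interpolation and all function names are assumptions for illustration only; the VM's actual decision rule (for example, any motion-vector cost or bias toward direct mode) is not reproduced, and the direct-mode prediction is simply passed in precomputed.

```python
from typing import Dict, List, Optional, Sequence

Block = Sequence[Sequence[int]]  # a macroblock as rows of luma samples


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def average(a: Block, b: Block) -> List[List[int]]:
    """Bidirectional prediction, assumed here to be the rounded average of both predictions."""
    return [[(x + y + 1) // 2 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def choose_mode(cur: Block, fwd: Block, bwd: Block, direct: Optional[Block] = None) -> str:
    """Return the prediction mode with the smallest SAD (hypothetical decision rule)."""
    candidates: Dict[str, Block] = {
        "forward": fwd,
        "backward": bwd,
        "bidirectional": average(fwd, bwd),
    }
    if direct is not None:
        candidates["direct"] = direct
    # min() keeps the first candidate on ties, so insertion order acts as a priority
    return min(candidates, key=lambda mode: sad(cur, candidates[mode]))


cur = [[10, 12], [14, 16]]
fwd = [[8, 10], [12, 14]]
bwd = [[12, 14], [16, 18]]
print(choose_mode(cur, fwd, bwd))  # bidirectional: the average matches cur exactly
```

Here the forward and backward predictions each miss by 2 per sample, while their average reproduces the macroblock exactly, which is the kind of case where the bidirectional mode saves bits.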
2. Temporal Scalability with B-VOPs

Scalability involves the use of two or more layers; for the purpose of these experiments, we consider two-layer scalability
only. The two layers are a base-layer and an enhancement-layer. As is typical, the base-layer is coded independently, whereas
the enhancement-layer is coded dependently with respect to the base-layer. In addition to the inter-frame prediction employed
in nonscalable (single layer) coding, typical forms of scalability such as Temporal scalability and Spatial scalability also use
inter-layer prediction. One of the main differences between Spatial and Temporal scalability is that Spatial scalability normally
does not use motion vectors for inter-layer prediction, because the base- and enhancement-layer pictures coincide in time,
whereas Temporal scalability does.

As mentioned earlier, our experiment is on Temporal scalability with rectangular VOs and B-VOP coding. All of the B-VOPs
belong to the enhancement-layer, whereas the base-layer carries I- and P-VOPs. In general, this is not a requirement of the
proposed scalability; in fact, the MPEG-2 syntax on which our proposed scalability syntax is based allows I-, P- or B-pictures
in the base-layer. The prediction structure employed in our experiments for the base- and enhancement-layers is shown in
Figure 1.




Figure 1 Prediction Structure employed in two layer B-VOP based Temporal scalability 

Since these experiments are intended for low bitrates in MPEG-4, the original video sequences at CIF or QCIF resolution are
temporally decimated by a factor of 2 before coding begins. Thus, the prediction structure of Figure 1 applies to the temporally
decimated input sequence and not to the original input sequence. This temporally decimated sequence is further
demultiplexed to form two sequences, one input to the base-layer encoder and the other to the enhancement-layer encoder. Our
base-layer encoder is the same as the VM2.2 encoder with or without B-VOPs, whereas the enhancement-layer encoder uses
only B-VOPs and inter-layer prediction with respect to decoded base-layer VOPs.
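The decimation and demultiplexing steps above can be sketched in Python on frame indices. This is a minimal sketch under stated assumptions: the source is taken to be 30 frames/s (implied by 2:1 decimation to the 15 frames/s reported in Section 3), every third decimated frame is assumed to go to the base-layer (which gives the 5 and 10 frames/s layer rates), and each B-VOP is assumed to reference the nearest preceding and following decoded base-layer VOPs. Figure 1 remains the authoritative prediction structure, and all function names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def decimate(frames: List[int], factor: int = 2) -> List[int]:
    """Temporal decimation: keep every `factor`-th source frame."""
    return frames[::factor]


def demultiplex(frames: List[int], period: int = 3) -> Tuple[List[int], List[int]]:
    """Assumed split: decimated frames 0, period, 2*period, ... go to the base-layer
    (I/P-VOPs); all other decimated frames go to the enhancement-layer (B-VOPs)."""
    base = [f for i, f in enumerate(frames) if i % period == 0]
    enh = [f for i, f in enumerate(frames) if i % period != 0]
    return base, enh


def base_references(
    enh: List[int], base: List[int]
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Map each enhancement frame to the nearest preceding and following base frame.
    None means no such base frame exists (e.g. B-VOPs after the last base VOP)."""
    refs = {}
    for f in enh:
        i = bisect.bisect_left(base, f)  # index of the first base frame not earlier than f
        prev_ref = base[i - 1] if i > 0 else None
        next_ref = base[i] if i < len(base) else None
        refs[f] = (prev_ref, next_ref)
    return refs


source = list(range(18))        # 18 source frames, i.e. 0.6 s at an assumed 30 frames/s
coded = decimate(source)        # [0, 2, 4, ..., 16]: 15 frames/s
base, enh = demultiplex(coded)  # base [0, 6, 12], enh [2, 4, 8, 10, 14, 16]
for f, (fwd, bwd) in base_references(enh, base).items():
    print(f"B-VOP at source frame {f}: forward ref {fwd}, backward ref {bwd}")
```

With 18 source frames this yields 3 base-layer VOPs and 6 B-VOPs, matching the one-third/two-thirds split of coded frames stated in the Summary.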

3. Experiment Results 

We now present the results of our experiments on Temporal scalability using I- and P-VOPs in the base-layer and only B-VOPs
in the enhancement-layer. Two test conditions are employed: first, QCIF sequences coded at a total of 24 kbit/s, and second,
CIF sequences coded at a total of 112 kbit/s. In both cases, the frame rate of the base-layer is 5 frames/s and that of the
enhancement-layer is 10 frames/s; temporal multiplexing of the base- and enhancement-layers results in 15 frames/s. The
results of our experiments are shown in Table 1 for QCIF resolution and in Table 2 for CIF resolution.
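The average bitrate of each layer in the tables follows from its average bits per VOP multiplied by the layer frame rate. The Python sketch below recomputes the per-layer and combined bitrates for three QCIF rows of Table 1 as a consistency check; the row selection and function names are illustrative assumptions, and the table values are copied as printed.

```python
BASE_FPS = 5   # base-layer frame rate (I- and P-VOPs)
ENH_FPS = 10   # enhancement-layer frame rate (B-VOPs only)

# Three QCIF rows of Table 1:
# (base bits/VOP, base kbit/s, enh bits/VOP, enh kbit/s)
TABLE1 = {
    "Silent": (2614, 13.07, 1452, 14.52),
    "Mother & Daughter": (2389, 11.95, 1223, 12.23),
    "Container": (2985, 14.93, 1138, 11.38),
}


def layer_kbits(avg_bits_per_vop: float, fps: int) -> float:
    """Average layer bitrate in kbit/s: bits per VOP times VOPs per second, over 1000."""
    return avg_bits_per_vop * fps / 1000.0


def check_rows(rows):
    """Recompute each layer's bitrate, compare it with the table, and return the totals."""
    totals = {}
    for name, (b_bits, b_kbps, e_bits, e_kbps) in rows.items():
        base = layer_kbits(b_bits, BASE_FPS)
        enh = layer_kbits(e_bits, ENH_FPS)
        # the table rounds to 0.01 kbit/s, so allow up to that much disagreement
        if abs(base - b_kbps) >= 0.01 or abs(enh - e_kbps) >= 0.01:
            raise ValueError(f"{name}: recomputed bitrate disagrees with the table")
        totals[name] = base + enh
    return totals


for name, total in check_rows(TABLE1).items():
    print(f"{name}: {total:.2f} kbit/s at {BASE_FPS + ENH_FPS} frames/s")
```

For Silent, for example, 2614 bits x 5 VOPs/s gives 13.07 kbit/s and 1452 bits x 10 VOPs/s gives 14.52 kbit/s, about 27.6 kbit/s in total.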



Table 1 Results of Temporal scalability experiments with QCIF sequences at a total of about 24 kbits/s

Sequence           Layer and frame rate      VOP   QP     SNR Y  SNR Cb  SNR Cr  Avg. bits  Avg. bitrate
                                             type         (dB)   (dB)    (dB)    per VOP    (kbit/s)
-----------------  ------------------------  ----  -----  -----  ------  ------  ---------  ------------
Akiyo              Enh. layer @ 10 frames/s  B     20     32.24  34.09   36.35   991        9.91
                   Base layer @ 5 frames/s   I/P   14.14  32.33  34.14   36.54   1715       8.58
Silent             Enh. layer @ 10 frames/s  B     25     28.73  33.77   35.44   1452       14.52
                   Base layer @ 5 frames/s   I/P   19.02  28.93  33.80   35.50   2614       13.07
Mother & Daughter  Enh. layer @ 10 frames/s  B     19     32.79  38.20   38.89   1223       12.23
                   Base layer @ 5 frames/s   I/P   12.08  33.05  38.32   39.00   2389       11.95
Container          Enh. layer @ 10 frames/s  B     20     30.02  36.65   35.75   1138       11.38
                   Base layer @ 5 frames/s   I/P   14.14  30.06  36.64   35.74   2985       14.93


Table 2 Results of Temporal scalability experiments with CIF sequences at a total of about 112 kbits/s

Sequence           Layer and frame rate      VOP   QP     SNR Y  SNR Cb  SNR Cr  Avg. bits  Avg. bitrate
                                             type         (dB)   (dB)    (dB)    per VOP    (kbit/s)
-----------------  ------------------------  ----  -----  -----  ------  ------  ---------  ------------
Akiyo              Enh. layer @ 10 frames/s  B     27     32.85  35.13   37.88   4821       48.21
                   Base layer @ 5 frames/s   I/P   22.1   32.85  35.13   37.88   5899       29.50
Mother & Daughter  Enh. layer @ 10 frames/s  B     27     32.91  37.71   38.12   6633       66.33
                   Base layer @ 5 frames/s   I/P   22.1   32.88  37.86   38.30   8359       41.80
Silent             Enh. layer @ 10 frames/s  B     29     29.07  34.35   35.83   5985       59.85
                   Base layer @ 5 frames/s   I/P   24.06  29.14  34.36   35.89   9084       45.42
Container          Enh. layer @ 10 frames/s  B     29     28.47  36.17   35.50   6094       60.94
                   Base layer @ 5 frames/s   I/P   24.06  28.52  36.21   35.55   8325       41.63
News               Enh. layer @ 10 frames/s  B     29     29.84  34.04   35.78   6197       61.97
                   Base layer @ 5 frames/s   I/P   23.08  29.70  34.04   35.76   9710       48.55
Coastguard         Enh. layer @ 10 frames/s  B     29     26.64  37.71   40.86   10739      107.39
                   Base layer @ 5 frames/s   I/P   24.06  27.04  37.83   41.00   14938      74.69
Foreman            Enh. layer @ 10 frames/s  B     29     29.82  36.38   36.87   9864       98.64
                   Base layer @ 5 frames/s   I/P   24.12  30.00  36.42   36.87   14459      72.29
4. Summary 

Within the context of core experiment B1, we have experimented with Temporal scalability at low bitrates using the MPEG-4
VM tools and the newly proposed syntax. In our experiments, rectangular VOPs (frames) were used. The base-layer consisted of
an I-VOP followed by P-VOPs, whereas the enhancement-layer consisted of B-VOPs only. One-third of the total coded frames
belong to the base-layer and the remaining two-thirds to the enhancement-layer. The total bitrate is split nearly equally
between the base- and enhancement-layers.

The results in Tables 1 and 2 verify the performance of the proposed solution for Temporal scalability for MPEG-4 applications
at low bitrates. Since this solution is based on the recently introduced B-VOPs, the performance of B-VOP based coding is also
verified. In addition, the proposed reorganized syntax structure, which includes scalability, was successfully used in this
experiment (as well as in experiment C1) and is thus also verified. Further optimization of the B-VOP prediction modes, in the
context of scalability or otherwise, is possible. The proposed syntax for scalability also enables Spatial scalability of
rectangular or irregularly shaped VOs as a specific mode of generalized scalability.



